-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single mode for multi column group by -- Almost 2x for ClickBench Q32 #11792
Conversation
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
let mut ctx = create_context(batches, Arc::clone(&schema)).unwrap(); | ||
b.iter(|| block_on(query(&mut ctx, "select a, b, count(*) from t group by a, b order by count(*) desc limit 10"))) | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gnuplot not found, using plotters backend
benchmark high cardinality
time: [273.00 ms 350.24 ms 451.67 ms]
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
benchmark low cardinality
time: [15.764 ms 16.531 ms 17.145 ms]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main branch
Gnuplot not found, using plotters backend
Benchmarking benchmark high cardinality: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 11.0s.
benchmark high cardinality
time: [565.74 ms 964.40 ms 1.3864 s]
change: [+69.461% +175.35% +332.28%] (p = 0.01 < 0.05)
Performance has regressed.
benchmark low cardinality
time: [10.736 ms 11.140 ms 11.577 ms]
change: [-35.106% -31.565% -28.118%] (p = 0.00 < 0.05)
Performance has improved.
This PR has slightly regression for low cardinality but huge gain for high cardinality
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Thanks @jayzhan211 -- this looks quite interesting,. I will try and study this PR carefully early this week |
I am running some benchmarks on this one |
I also measured some non trivial improvements (16 core) along with some slowdowns
It seems to have hurt TPCH a bit more:
Maybe we can do some profiling and figure out if there is some way to get back the performance on the queries it is slower for |
Here are some more thoughts about why this approach works and an alternate idea of how we can make these queries all faster: #11680 (comment) |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Production ready PR
Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
This change (should only effect multi column group by query) has nothing to do with QQuery 24, it should be considered noise
TODO
This change only change partial/final to single mode. Repartition is not included inside the group by operator.
Next step is to find out whether including repartition inside group by is helpful or not